Song Han

University of Connecticut

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

Jun 01, 2026
Via arXiv

Grounded 3D-Aware Spatial Vision-Language Modeling

May 28, 2026
Via arXiv

JetViT: Efficient High-Resolution Vision Transformer with Post-Training Attention Search

May 26, 2026
Via arXiv

Fast-dDrive: Efficient Block-Diffusion VLM for Autonomous Driving

May 25, 2026
Via arXiv

Hide to Guide: Learning via Semantic Masking

May 24, 2026
Via arXiv

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

May 19, 2026
Via arXiv

SANA-WM: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer

May 14, 2026
Via arXiv

AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation

May 13, 2026
Via arXiv

Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence

Apr 27, 2026
Via arXiv

Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation

Apr 14, 2026
Via arXiv